In this article, the authors consider the following question about Huffman coding,
which is an important technique for compressing data from a discrete source. If $p$
is the smallest source probability, how long, in terms of $p$, can the longest Huffman
codeword be? It is shown that if $p$ is in the range $0 < p \le 1/2$, and if $K$ is the
unique index such that $1/F_{K+3} < p \le 1/F_{K+2}$, where $F_K$ denotes the $K$th Fibonacci
number, then the longest Huffman codeword for a source whose least probability is
$p$ is at most $K$, and no better bound is possible. Asymptotically, this implies the
surprising fact that for small values of $p$, a Huffman code's longest codeword can
be as much as 44 percent larger than that of the corresponding Shannon code.


I. Introduction and Summary 

Huffman coding is optimal (in the sense of minimizing
average codeword length) for any discrete memoryless
source, and Huffman codes are used widely in data com- 
pression applications. In many situations it would be use- 
ful to have an easy way to estimate the longest Huffman 
codeword length for a given source, without having to go 
through Huffman’s algorithm, but since there is no known 
closed-form expression for the Huffman codeword lengths, 
no such estimate immediately suggests itself. However, 
since the longest codeword will always be associated with 
the least-probable source symbol, one way to address this 
problem is to ask the following question: If p is the smallest 
source probability, how long, in terms of p, can the longest 
Huffman codeword be? It turns out that this quantity, denoted
by $L(p)$, is easy to calculate, and so $L(p)$ provides an
“easy estimate” of the longest Huffman codeword length.

The formula for $L(p)$ involves the famous Fibonacci
numbers $(F_n)_{n \ge 0}$, which are defined recursively, as follows:

$$F_0 = 0, \quad F_1 = 1, \quad \text{and} \quad F_n = F_{n-1} + F_{n-2} \ \text{for } n \ge 2 \tag{1}$$

Thus, $F_2 = 1$, $F_3 = 2$, $F_4 = 3$, $F_5 = 5$, $F_6 = 8$, etc. The
Fibonacci numbers and their properties are discussed in 
detail in [1, Section 1.2.8]. Here is the main result of this 
article. (Note that since the definition of L(p) assumes p 
to be the smallest probability in a source, p must lie in the 
range $0 < p \le 1/2$.)

Theorem 1. Let $p$ be a probability in the range $0 < p \le 1/2$,
and let $K$ be the unique index such that

$$\frac{1}{F_{K+3}} < p \le \frac{1}{F_{K+2}} \tag{2}$$

Then $L(p) = K$. Thus $p \in (1/3, 1/2]$ implies $L(p) = 1$,
$p \in (1/5, 1/3]$ implies $L(p) = 2$, $p \in (1/8, 1/5]$ implies
$L(p) = 3$, etc.
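
The statement of Theorem 1 translates directly into a short computation. The following Python sketch is an illustration only (the function name `longest_codeword_bound` is an assumption and does not appear in the article); it walks up the Fibonacci sequence until it first exceeds $1/p$, using exact rational arithmetic so that the boundary cases $p = 1/F_n$ fall on the correct side.

```python
from fractions import Fraction

def longest_codeword_bound(p):
    """Return K = L(p) from Theorem 1: the K with 1/F_{K+3} < p <= 1/F_{K+2}.

    Hypothetical helper written for illustration; p may be a Fraction or
    any value accepted by Fraction().
    """
    p = Fraction(p)
    if not 0 < p <= Fraction(1, 2):
        raise ValueError("p must satisfy 0 < p <= 1/2")
    inverse = 1 / p
    # Find the smallest n with F_n > 1/p; that n equals K + 3.
    n, f_n, f_next = 3, 2, 3  # f_n = F_3, f_next = F_4
    while f_n <= inverse:
        f_n, f_next = f_next, f_n + f_next
        n += 1
    return n - 3

for p in (Fraction(1, 2), Fraction(1, 3), Fraction(1, 5), Fraction(1, 256)):
    print(p, longest_codeword_bound(p))
```

The loop condition `f_n <= inverse` encodes the half-open interval of Eq. (2): a value of $p$ exactly equal to $1/F_{K+2}$ belongs to index $K$, not $K - 1$.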

It is easy to prove by induction that the Fibonacci num- 
bers satisfy the following inequalities: 

$$\phi^{n-2} < F_n < \phi^{n-1} \quad \text{for } n \ge 3 \tag{3}$$

where $\phi = (1 + \sqrt{5})/2 = 1.618\ldots$ is the “golden ratio.” By
combining inequality (3) with Theorem 1, one sees that

$$\log_\phi \frac{1}{p} - 2 < L(p) < \log_\phi \frac{1}{p} \tag{4}$$


which, in turn, implies that 


$$\lim_{p \to 0} \frac{L(p)}{\log_\phi (1/p)} = 1 \tag{5}$$


Since $\log_\phi x = (\log_2 x)/(\log_2 \phi) = 1.4404 \log_2 x$, Eq. (5)
implies the surprising fact that for small values of $p$, a Huffman
code's longest codeword can be as much as 44 percent
larger than that of the corresponding (in general, suboptimal)
Shannon code [2, Chapter 5], which assigns a symbol
with probability $p$ a codeword of length $\lceil \log_2 (1/p) \rceil$.
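
The 44 percent figure can be observed numerically. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and only probabilities of the form $p = 2^{-k}$ are used, for which the Shannon length $\lceil \log_2 (1/p) \rceil$ is exactly $k$. It prints the ratio $L(2^{-k})/k$, which should approach $1/\log_2 \phi \approx 1.4404$ from below as $k$ grows.

```python
from fractions import Fraction

def longest_codeword_bound(p):
    """K = L(p) from Theorem 1 (hypothetical helper, exact arithmetic)."""
    p = Fraction(p)
    if not 0 < p <= Fraction(1, 2):
        raise ValueError("p must satisfy 0 < p <= 1/2")
    inverse = 1 / p
    n, f_n, f_next = 3, 2, 3  # f_n = F_3, f_next = F_4
    while f_n <= inverse:     # stop at the first F_n > 1/p, i.e. n = K + 3
        f_n, f_next = f_next, f_n + f_next
        n += 1
    return n - 3

def growth_ratio(k):
    """Ratio L(2^-k) / k; the Shannon length for p = 2^-k is exactly k."""
    return longest_codeword_bound(Fraction(1, 2 ** k)) / k

for k in (8, 16, 32, 64, 128):
    print(k, growth_ratio(k))
```

By inequality (4) the ratio always stays strictly below 1.4404, and it can fall short of that value by at most $2/k$.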

Theorem 1 is closely related to a result of Katona and
Nemetz [4], which identifies the length of the longest possible
Huffman codeword for a source symbol of probability
$p$ (whether or not $p$ is the smallest source probability).
Denoting this quantity by $L^*(p)$, their result is as follows:

Theorem 2. (Katona and Nemetz [4]) Let $p$ be a probability
in the range $0 < p < 1$, and let $K$ be the unique
index such that

$$\frac{1}{F_{K+2}} \le p < \frac{1}{F_{K+1}} \tag{6}$$

Then $L^*(p) = K$. Thus, $p \in [1/2, 1)$ implies $L^*(p) = 1$,
$p \in [1/3, 1/2)$ implies $L^*(p) = 2$, $p \in [1/5, 1/3)$ implies
$L^*(p) = 3$, etc.


By comparing Theorems 1 and 2, one sees that $L^*(p) =
L(p) + 1$ unless $p$ is the reciprocal of a Fibonacci number,
in which case $L^*(p) = L(p)$.¹
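
The relationship between the two quantities is easy to check mechanically. The following sketch is illustrative only (the names `first_fibonacci_index`, `L`, and `L_star` are assumptions introduced here); it evaluates both theorems with exact rational arithmetic and compares the results at a Fibonacci reciprocal and at a generic value.

```python
from fractions import Fraction

def first_fibonacci_index(x, strict):
    """Smallest n >= 1 with F_n > x (strict=True) or F_n >= x (strict=False)."""
    n, f_n, f_next = 1, 1, 1  # f_n = F_1, f_next = F_2
    while (f_n <= x) if strict else (f_n < x):
        f_n, f_next = f_next, f_n + f_next
        n += 1
    return n

def L(p):
    """Theorem 1: 1/F_{K+3} < p <= 1/F_{K+2}, valid for 0 < p <= 1/2."""
    return first_fibonacci_index(1 / Fraction(p), strict=True) - 3

def L_star(p):
    """Theorem 2: 1/F_{K+2} <= p < 1/F_{K+1}, valid for 0 < p < 1."""
    return first_fibonacci_index(1 / Fraction(p), strict=False) - 2

for p in (Fraction(1, 3), Fraction(1, 4), Fraction(1, 5), Fraction(1, 256)):
    print(p, L(p), L_star(p))
```

The only difference between the two functions is which end of the interval is closed, which is exactly why the values coincide when $1/p$ is a Fibonacci number and differ by one otherwise.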


II. Proof of Theorem 1 

The proof of Theorem 1 is in two parts. First, it will be
shown that if $p > 1/F_{K+3}$, then in any Huffman code for
a source whose smallest probability is $p$, the longest codeword
length is at most $K$. In fact, a considerably stronger
result will be proved. The class of efficient prefix codes
will be defined, and it will be shown that any Huffman
code, and in fact any optimal code for a given source, is
efficient. Then it will be shown that if $p > 1/F_{K+3}$, in any
efficient code for a source whose smallest probability is $p$,
the longest codeword length is at most $K$. In the second
half of the proof, it will be shown that if $p \le 1/F_{K+2}$, there
exists a source whose smallest probability is $p$, which has
at least one Huffman code whose longest word has length
$K$. As an extension, it will be seen that if $p < 1/F_{K+2}$,
there exists a source whose smallest probability is $p$, and
for which every optimal code has a longest word of length
$K$. (If $p = 1/F_{K+2}$, however, there is no such source.)

Now comes the definition of efficient prefix codes, which 
is best stated in terms of the associated binary code tree 
(see Fig. 1). Each source symbol and its corresponding 
codeword is associated with a unique terminal node on 
the tree. Also, each node in the tree is assigned a proba- 
bility. The probability of a terminal node is defined to 
be the probability of the corresponding source symbol, 
and the probability of any other node of the code tree 
is defined to be the sum of the probabilities of its two 
“children.” The level of the root node is defined to be 
zero, and the level of every other node is defined to be 
one more than the level of its parent. Two nodes de- 
scended from the same parent node are called siblings. 
Figure 1 shows two different code trees for the source 
[3/20,3/20,3/20,3/20,8/20]. The tree in Fig. 1(a) cor- 
responds to the prefix code {000,001,01,10,11}, and the 
tree in Fig. 1(b) corresponds to {000,001,010,011,1}. 

Definition. A prefix code for a source $S$ is efficient if
every node except the root in the code tree has a sibling,
and if $\mathrm{level}(v) < \mathrm{level}(v')$ implies $p(v) \ge p(v')$.

1 In fact, however, if one were to make a subtle change in the defi- 
nition of L(p), this special case would disappear. The change re- 
quired is to define L(p) as the minimum maximum Huffman code- 
word length over all Huffman codes for a source with p as the least 
probability, where the outer minimum is over all Huffman codes 
for a given source. 

Gallager [3] noted that every Huffman tree is efficient,
but in fact it is easy to see more generally that every optimal
tree is efficient. This is because in an inefficient
tree, with nodes $v$ and $v'$ such that $\mathrm{level}(v) < \mathrm{level}(v')$
but $p(v) < p(v')$, by interchanging the subtrees rooted
at $v$ and $v'$, one arrives at a new code tree for the same
source, whose average length has been reduced by exactly
$(\mathrm{level}(v') - \mathrm{level}(v))(p(v') - p(v))$. However, it is
not true that every efficient code is optimal. Indeed, 
Fig. 1 shows two different efficient code trees for the source 
[3/20,3/20,3/20,3/20,8/20]. The code in Fig. 1(b) is op- 
timal, but the one in Fig. 1(a) is not. 
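
The definition of efficiency can be tested directly on the two codes of Fig. 1. The sketch below is an illustration under stated assumptions: the function names are hypothetical, a code is represented as a dictionary from codeword to probability, and the assignment of the probability 8/20 to the codewords `11` and `1` is an assumption chosen to reproduce the average lengths quoted in the figure caption.

```python
from collections import defaultdict
from fractions import Fraction

def node_probabilities(code):
    """Map every tree node (a codeword prefix, '' for the root) to its probability."""
    probs = defaultdict(Fraction)
    for word, p in code.items():
        for depth in range(len(word) + 1):
            probs[word[:depth]] += p
    return dict(probs)

def is_efficient(code):
    """Check both conditions of the definition of an efficient prefix code."""
    probs = node_probabilities(code)
    # Condition 1: every node except the root has a sibling.
    for node in probs:
        if node:
            sibling = node[:-1] + ("1" if node[-1] == "0" else "0")
            if sibling not in probs:
                return False
    # Condition 2: level(v) < level(v') implies p(v) >= p(v').
    by_level = defaultdict(list)
    for node, p in probs.items():
        by_level[len(node)].append(p)
    levels = sorted(by_level)
    for i, upper in enumerate(levels):
        for lower in levels[i + 1:]:
            if min(by_level[upper]) < max(by_level[lower]):
                return False
    return True

def average_length(code):
    return sum(len(word) * p for word, p in code.items())

t = Fraction(3, 20)
tree_a = {"000": t, "001": t, "01": t, "10": t, "11": Fraction(8, 20)}
tree_b = {"000": t, "001": t, "010": t, "011": t, "1": Fraction(8, 20)}
for name, code in (("a", tree_a), ("b", tree_b)):
    print(name, is_efficient(code), average_length(code))
```

Both codes pass the efficiency check, yet their average lengths differ, which is the point of the remark that efficiency does not imply optimality.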

Theorem 3. If $p > 1/F_{K+3}$, then in any efficient prefix
code for a source whose least probability is p, the longest 
codeword length is at most K. 

Proof: The contrapositive will be proved, i.e., if $p$ is the
least probability in a source that has an efficient prefix
code whose longest word has length $\ge K + 1$, then
$p \le 1/F_{K+3}$.

Thus, suppose that $S$ is a source whose least probability
is $p$ and that there is an efficient prefix code for $S$
whose longest word is of length $\ge K + 1$. In the code
tree for this code, there must be a path of length $K + 1$
that starts from the terminal node corresponding to the
longest word and moves upward toward the root. This
path is shown in Fig. 2 as the path whose probabilities are
$p_0, p_1, \ldots, p_{K+1}$. Since the code is assumed to be efficient,
each of the vertices in this path (except possibly the top
vertex) has a sibling; these siblings are shown in Fig. 2 as
having probabilities $q_0, q_1, \ldots, q_K$. Now one can prove the
following:


$$p_i \ge F_{i+2}\, p \quad \text{for } i = 0, 1, \ldots, K+1 \tag{7}$$


The proof of (7) is by induction. For $i = 0$, (7) merely says
that $p_0 \ge p$, which is true since $p_0 = p$, by definition. Also,
note that $q_0 \ge p$ since $p$ is the least source probability.
Thus, $p_1 = p_0 + q_0 \ge p + p = 2p = F_3\, p$, which proves
(7) for $i = 1$. For $i \ge 2$, one has $p_i = p_{i-1} + q_{i-1}$. But
$p_{i-1} \ge F_{i+1}\, p$ by induction, and $q_{i-1} \ge p_{i-2}$ since the
code is efficient ($q_{i-1}$ is a higher level node than $p_{i-2}$,
i.e., closer to the root).
Thus, one has $q_{i-1} \ge p_{i-2} \ge F_i\, p$ by induction, and so
$p_i = p_{i-1} + q_{i-1} \ge (F_{i+1} + F_i)\, p = F_{i+2}\, p$, which completes
the proof of (7).

Now consider the probability $p_{K+1}$. On one hand,
$p_{K+1} \le 1$; but on the other hand, $p_{K+1} \ge F_{K+3}\, p$, by
(7). Thus, $p \le 1/F_{K+3}$, which completes the proof. □
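
As a worked instance of the induction, offered only as an illustration of the argument above, take $K = 2$ and suppose an efficient code has a codeword of length 3 for the least probable symbol. Writing $p_0, \ldots, p_3$ for the probabilities along the path to the root and $q_0, q_1, q_2$ for the sibling probabilities, as in the proof, the chain of inequalities reads:

```latex
\begin{align*}
p_0 &= p = F_2\, p, \\
p_1 &= p_0 + q_0 \ge p + p = 2p = F_3\, p, \\
p_2 &= p_1 + q_1 \ge p_1 + p_0 \ge 2p + p = 3p = F_4\, p, \\
p_3 &= p_2 + q_2 \ge p_2 + p_1 \ge 3p + 2p = 5p = F_5\, p.
\end{align*}
```

Since $p_3 \le 1$, this forces $p \le 1/5 = 1/F_5$. Equivalently, any source whose least probability exceeds $1/5$ has no efficient code with a codeword longer than 2, in agreement with $L(p) = 2$ on the interval $(1/5, 1/3]$.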


Theorem 4. If $p \le 1/F_{K+2}$, there exists a source
whose smallest probability is $p$ and which has a Huffman
code whose longest word has length $K$. If $p < 1/F_{K+2}$,
there exists such a source for which every optimal code has
a longest word of length $K$.

Proof: Consider the following set of $K + 1$ source probabilities:

$$\left[\, p,\ \frac{F_1}{F_{K+2}},\ \frac{F_2}{F_{K+2}},\ \ldots,\ \frac{F_{K-1}}{F_{K+2}},\ \frac{F_K + 1}{F_{K+2}} - p \,\right] \tag{8}$$


Note that $p$ is the minimal probability for this source, since
$p \le 1/F_{K+2} = F_1/F_{K+2}$. Now, consider the code tree for
this source depicted in Fig. 3, which assigns the source 
probability p a word of length K. This tree is in fact a 
Huffman tree for these probabilities, i.e., a code tree that 
arises when Huffman’s algorithm is applied to the source 
of (8). To see this, one first proves that the internal vertex 
probabilities pi in Fig. 3 are given by the following formula: 


$$p_i = \frac{F_{i+2}}{F_{K+2}} - h \quad \text{for } i = 0, 1, \ldots, K-1 \tag{9}$$

$$p_K = 1 \tag{10}$$

where $h = 1/F_{K+2} - p$.

To prove (9), one uses induction. For $i = 0$, by definition,
$p_0 = p = 1/F_{K+2} - h = F_2/F_{K+2} - h$. For
$i \ge 1$, one then has $p_i = p_{i-1} + F_i/F_{K+2} = (F_{i+1}/F_{K+2} -
h) + F_i/F_{K+2} = F_{i+2}/F_{K+2} - h$. To prove (10), note
that $p_K = p_{K-1} + (F_K + 1)/F_{K+2} - p$. But from (9),
$p_{K-1} = F_{K+1}/F_{K+2} - h$, so that $p_K = (F_{K+1}/F_{K+2} -
h) + (F_K/F_{K+2} + h) = F_{K+2}/F_{K+2} = 1$. Thus the
probabilities in (8) sum to one.

It now follows that the tree in Fig. 3 is a Huffman tree,
for from (9) one sees that at the $i$th stage ($i = 0, \ldots, K-1$),
the “collapsed” source consists of the probabilities

$$\left[\, \frac{F_{i+2}}{F_{K+2}} - h,\ \frac{F_{i+1}}{F_{K+2}},\ \frac{F_{i+2}}{F_{K+2}},\ \ldots,\ \frac{F_{K-1}}{F_{K+2}},\ \frac{F_K}{F_{K+2}} + h \,\right] \tag{11}$$

Plainly the two leftmost probabilities in (11), namely
$F_{i+2}/F_{K+2} - h$ and $F_{i+1}/F_{K+2}$, are two of the smallest
probabilities, and so the tree of Fig. 3 is a Huffman tree,
as asserted.

Finally, note that if $h > 0$, i.e., if $p < 1/F_{K+2}$, the
leftmost two probabilities in (11) are uniquely the two
smallest probabilities in the list, so that the Huffman tree
in Fig. 3 is the unique Huffman tree for the source of
Eq. (8). And since the set of codeword lengths in any
optimal code is the same as the set of lengths in some
Huffman code, the last statement in Theorem 4 follows.

□ 
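
The construction in this proof can be checked by running Huffman's algorithm on the source of Eq. (8). The following sketch is an illustration, not part of the article: the function names are hypothetical, the heap-based routine is a standard way of computing Huffman codeword lengths, and ties are broken by insertion order, which is an arbitrary choice.

```python
import heapq
from fractions import Fraction
from itertools import count

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def huffman_lengths(probs):
    """Codeword lengths produced by Huffman's algorithm (ties broken by insertion order)."""
    tie = count()
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:  # every symbol under the merged node moves one level down
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths

def theorem4_source(p, K):
    """The K + 1 probabilities of Eq. (8); requires 0 < p <= 1/F_{K+2}."""
    F = fib(K + 2)
    p = Fraction(p)
    assert 0 < p <= Fraction(1, F)
    middle = [Fraction(fib(i), F) for i in range(1, K)]
    return [p] + middle + [Fraction(fib(K) + 1, F) - p]

source = theorem4_source(Fraction(1, 256), K=11)
lengths = huffman_lengths(source)
print(sum(source), lengths[0], max(lengths))
```

When $p < 1/F_{K+2}$ the result does not depend on how ties are broken. At the boundary $p = 1/F_{K+2}$, a different tie-breaking rule can produce a shorter longest codeword, which is consistent with Theorem 4 guaranteeing only that some Huffman code attains length $K$ in that case.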

By combining Theorems 3 and 4, one obtains a result 
that is stronger than Theorem 1. 

Example 1: Let $p = 2^{-8}$. Then $1/F_{14} = 1/377 < p <
1/F_{13} = 1/233$, and so by Theorem 1, $L(2^{-8}) = 11$. More
concretely, Theorem 3 shows that no Huffman code for a
source whose smallest probability is $2^{-8}$ can have a
codeword whose length is longer than 11. By Theorem 4, on
the other hand, every optimal code for the source

$$\left[\, 2^{-8},\ \frac{1}{233},\ \frac{1}{233},\ \frac{2}{233},\ \frac{3}{233},\ \frac{5}{233},\ \frac{8}{233},\ \frac{13}{233},\ \frac{21}{233},\ \frac{34}{233},\ \frac{55}{233},\ \frac{90}{233} - 2^{-8} \,\right] \tag{12}$$

has a longest word of length 11. □

III. Extension of the Katona-Nemetz Theorem

In this section, two theorems are stated without proof.
When taken together, they yield a result that is slightly
stronger than Katona and Nemetz's Theorem 2. The
proofs are entirely similar to the proofs of Theorems 3
and 4.

Theorem 5. Let $S$ be a source containing a symbol $a$
whose probability is $p$. If $p \ge 1/F_{K+2}$, then in any efficient
prefix code for $S$, the length of the codeword assigned to
the symbol $a$ is at most $K$.

Theorem 6. Let $p < 1/F_{K+1}$. Then there exists a
source $S$ containing a symbol $a$ whose probability is $p$, and
such that every optimal code for $S$ assigns $a$ a codeword
of length $K$. Explicitly, one such source is given by

$$\left[\, \frac{1}{F_{K+1}} - p - \epsilon,\ p,\ \frac{F_1}{F_{K+1}},\ \frac{F_2}{F_{K+1}},\ \ldots,\ \frac{F_{K-1}}{F_{K+1}} + \epsilon \,\right] \tag{13}$$

where $\epsilon$ is any real number such that $0 < \epsilon < 1/F_{K+1} - p$.

Example 2: Let $p = 2^{-8}$. Then $1/F_{14} = 1/377 < p <
1/F_{13} = 1/233$, and so by Theorem 2, $L^*(2^{-8}) = 12$. Indeed,
by Theorem 6, every optimal code for the source

$$\left[\, \frac{1}{233} - 2^{-8} - \epsilon,\ 2^{-8},\ \frac{1}{233},\ \frac{1}{233},\ \frac{2}{233},\ \frac{3}{233},\ \frac{5}{233},\ \frac{8}{233},\ \frac{13}{233},\ \frac{21}{233},\ \frac{34}{233},\ \frac{55}{233},\ \frac{89}{233} + \epsilon \,\right] \tag{14}$$

where $0 < \epsilon < 1/233 - 1/256$, assigns the symbol with
probability $2^{-8}$ a codeword of length 12. □
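
The source of Theorem 6 can be checked in the same way as the source of Theorem 4. The sketch below is illustrative only: the function names are hypothetical, the construction assumes $K \ge 2$, and the value of $\epsilon$ is an arbitrary choice at the midpoint of its allowed range. It builds the source for $p = 2^{-8}$ and $K = 12$ and reports the Huffman codeword length of the symbol with probability $p$.

```python
import heapq
from fractions import Fraction
from itertools import count

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def huffman_lengths(probs):
    """Codeword lengths produced by Huffman's algorithm (ties broken by insertion order)."""
    tie = count()
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths

def theorem6_source(p, K, eps):
    """Source with K + 1 symbols in which index 1 holds probability p (assumes K >= 2)."""
    F = fib(K + 1)
    p, eps = Fraction(p), Fraction(eps)
    assert K >= 2 and 0 < eps < Fraction(1, F) - p
    tail = [Fraction(fib(i), F) for i in range(1, K)]
    tail[-1] += eps  # the last entry is F_{K-1}/F_{K+1} + eps
    return [Fraction(1, F) - p - eps, p] + tail

p = Fraction(1, 256)
eps = (Fraction(1, 233) - p) / 2  # arbitrary value inside the allowed range
source = theorem6_source(p, K=12, eps=eps)
print(sum(source), huffman_lengths(source)[1])
```

Here the symbol of interest is not the least probable one: the first entry of the list is smaller than $p$, which is what allows the codeword for $p$ to be one bit longer than in Example 1.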

Fig. 1. Two code trees for the source [3/20, 3/20, 3/20, 3/20, 8/20]:
(a) a tree that is efficient but not optimal (average length = 2.3) and
(b) a tree that is optimal (average length = 2.2).

Fig. 2. A portion of an efficient code tree, in which the longest
codeword has length $\ge K + 1$. $p_0$ is the least source probability.

Fig. 3. A Huffman code tree for the source in (8). Its smallest
probability is $p$, where $p \le 1/F_{K+2}$, and its longest codeword
length is $K$.